Add support for writing deletion vectors in Delta Lake#22102
Conversation
72af1af to
8ea6475
Compare
a55bc53 to
7d52f10
Compare
7d52f10 to
e3ce7cb
Compare
e3ce7cb to
8f8cddc
Compare
4152170 to
8ab2964
Compare
5242022 to
42769ca
Compare
37eaaa8 to
c9bc2da
Compare
c9bc2da to
70ead35
Compare
70ead35 to
b2de0c2
Compare
| deletedRows.or(deletion.rowsDeletedByDelete()); | ||
| deletedRows.or(deletion.rowsDeletedByUpdate()); | ||
|
|
||
| TrinoInputFile inputFile = fileSystem.newInputFile(Location.of(path.toStringUtf8())); |
There was a problem hiding this comment.
Do we know size of deletion vector in advance from io.trino.plugin.deltalake.transactionlog.DeletionVectorEntry#sizeInBytes or any other metadata ?
If we do, then using it in newInputFile would probably save a FS call.
There was a problem hiding this comment.
This inputFile is a Parquet file, not deletion vector. I think that's possible, but it requires some refactoring. Let me handle in a follow-up.
6883a04 to
a1587b5
Compare
There was a problem hiding this comment.
When we write a new deletion vector, is it mandatory as per the spec that it should be a union of all deleted/updated rows so far, or is that just how we've implemented it ?
Is cleanup of old deletion vectors something already handled by some optimize/vaccum procedure in Trino ?
There was a problem hiding this comment.
It's mandatory for compatiblity with Delta Lake. Otherwise, they return the wrong results if I remember correctly.
The cleanup should be handled in #22809
a1587b5 to
37622e1
Compare
37622e1 to
e9ebfcb
Compare
|
Addressed comments. |
| RoaringBitmapArray deletedRows = loadDeletionVector(Location.of(path.toStringUtf8())); | ||
| deletedRows.or(deletion.rowsDeletedByDelete()); | ||
| deletedRows.or(deletion.rowsDeletedByUpdate()); |
There was a problem hiding this comment.
There was a discussion with @findepi about whether If the amount of rows deleted from file depasses a certain quota, we should consider to proactively rewrite the file.
Is there a potential follow-up here?
There was a problem hiding this comment.
It's a potential follow-up task. I don't expect we will handle it shortly though.
Description
Fixes #17063
Release notes
(x) Release notes are required, with the following suggested text: